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[l] We describe a new algorithm to retrieve SO 2 from 
satellite-measured hyperspectral radiances. We employ the 
principal component analysis technique in regions with no 
significant SO 2 to capture radiance variability caused by both 
physical processes (e.g., Rayleigh and Raman scattering and 
ozone absorption) and measurement artifacts. We use the 
resulting principal components and SO 2 Jacobians calculated 
with a radiative transfer model to directly estimate SO 2 
vertical column density in one step. Application to the 
Ozone Monitoring Instrument (OMI) radiance spectra in 
3 10.5-340 mn demonstrates that this approach can greatly 
reduce biases in the operational OMI product and decrease 
the noise by a factor of 2, providing greater sensitivity 
to anthropogenic emissions. The new algorithm is fast, 
eliminates the need for instrument-specific radiance correction 
schemes, and can be easily adapted to other sensors. These 
attributes make it a promising technique for producing long- 
term, consistent SO 2 records for air quality and climate 
research. Citation: Li, C., J. Joiner, N. A. Krotkov, and 
P. K. Bhartia (2013), A fast and sensitive new satellite S0 2 
retrieval algorithm based on principal component analysis: 
Application to the ozone monitoring instrument, Geophys. Res. 
Lett., 40, doi:10.1002/2013GL058134. 

1. Introduction 

[ 2 ] Sulfur dioxide (SO 2 ) is an important pollutant gas that 
can have profound impacts on the Earth’s enviromnent. It is a 
designated criteria air pollutant in many countries, and also a 
precursor of sulfate aerosols that can significantly affect air 
quality and climate [e.g., Charlson et al., 1992], With a rela- 
tively short atmospheric lifetime, the average surface concen- 
tration of SO 2 spans several orders of magnitude between 
polluted and pristine regions [Chin et al., 2000]. On the other 
hand, from time to time, sizable transient S0 2 plumes can 
travel into remote oceanic areas [e.g., Hsu et al., 2012], 
Given this large inhomogeneity in its distribution, it is imper- 
ative to develop capabilities of measuring SO 2 globally with 
good accuracy and precision over relatively small spatial and 
temporal scales. 


Additional supporting information may be found in the online version of 
this article. 

1 Earth System Science Interdisciplinary Center, University of Maryland, 
College Park, Maryland, USA. 

2 NASA Goddard Space Flight Center, Greenbelt, Maryland, USA. 

Corresponding author: C. Li, Earth System Science Interdisciplinary 
Center, University of Maryland, College Park, MD 20742, USA. 

(can. li@nasa. gov) 

©2013. American Geophysical Union. All Rights Reserved. 

0094-8276/1 3/10.1 002/20 1 3GL05 8 134 


[3] Satellite measurements of global SO 2 pollution have un- 
dergone substantial improvements over the past 10-15 years 
owing to the launch of several hyperspectral UV -Visible in- 
struments. Among them is the Ozone Monitoring Instrument 
(OMI), a Dutch-Finnish sensor flying on NASA’s Aura space- 
craft that provides daily global coverage at high spatial resolu- 
tion (13 x 24 km 2 at nadir) [Levelt et al., 2006], The 
operational OMI level-2 (L2) planetary boundary layer (PBL) 
S0 2 data are produced using the Band Residual Difference 
(BRD) method that utilizes three selected wavelength pairs to 
maximize sensitivity to PBL pollution [Krotkov et al., 2006]. 
While useful for monitoring strong anthropogenic sources 
[e.g., Fioletov et al., 2011; Li et al., 2010], the OMI PBL 
S0 2 product suffers from the effects of random instrument noise 
as well as systematic biases [e.g., Lee et al., 2009]. A back- 
ground correction and multiyear pixel averaging can help to 
mitigate these issues but may introduce new biases and restrict 
the time resolution of data analyses [Streets et al, 2013]. Other 
methods, such as the Iterative Spectral Fitting (ISF) algorithm 
[Yang et al., 2009], have had some success improving the qual- 
ity of OMI S0 2 retrievals [e.g., He et al., 2012], Operational 
implementation of the ISF algorithm, however, has proved dif- 
ficult owing to the amount of computation involved in the radi- 
ative transfer calculations for many wavelengths, and the 
empirical corrections required to remove retrieval artifacts. 

[4] In this study, we introduce a fundamentally different 
approach to retrieve SO 2 from OMI-measured radiance and 
irradiance data. Our method is based on principal component 
analysis (PCA), a statistical technique often employed to 
reduce dimensionality while retaining the information content 
of a multivariate data set, by transforming it into a subspace 
spanned by a set of orthogonal vectors (principle components, 
PCs). PCA has been applied to compress data and retrieve 
temperature and moisture profiles from high-resolution infra- 
red satellite instruments [e.g., Huang and Antonelli, 2001], 
Guanter et al. [2012] and Joiner et al. [2013] used PCA-based 
approaches to retrieve terrestrial chlorophyll fluorescence from 
satellite and ground-based spectral data. As demonstrated 
below, our algorithm shares a similar general framework with 
these approaches and can significantly improve the quality of 
OMI SO 2 retrievals as compared with the current operational 
PBL product. 

2. Methodology 

2.1. General Framework 

[5] To illustrate our approach, we start from the widely used 
differential optical absorption spectroscopy (DOAS) method 
for trace gas retrievals. If there are n gases with absorption cross 
sections tr^J) at a given wavelength /, the Sun-normalized 
Earthshine radiance at the top of the atmosphere (TOA), 
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7(l)//fl(l), can be modeled with the weak absorption Beer- 
Lambert law [e.g., Platt and Stutz, 2008] as 

in (j^j = -ts g a g (X) - P{X) - RRS(2), (1) 

where 7(2) and 7 0 (2) are the Earthshine radiance and solar 
irradiance at TOA, respectively, S g is the number density 
of gas g along the optical path (slant column density, SCD), 
P(k) is a polynomial term representing broadband effects in- 
cluding atmospheric Rayleigh and aerosol/cloud Mie scatter- 
ing and surface reflectance, and RRS(1) is a term to account 
for the rotational-Raman scattering (also known as the Ring 
effect). S g can be estimated through least squares fitting that 
minimizes the differences between the measured and 
modeled radiance spectra (i.e., left- and right-hand sides of 
equation (1)). It may then be converted to a vertical column 
density (17, or VCD) with an estimate of the air mass factor 
(AMF). The AMF is typically calculated at a single wave- 
length based on a prescribed vertical profile of gas g along 
with other assumptions. 

[6] Uncertainties in the DO AS fitting can arise from inaccu- 
rate modeling of the various physical processes in equation (1) 
as well as artifacts in the radiance measurements (e.g., stray 
light). For example, the rotational-Raman effect is very diffi- 
cult to model accurately in the S02-relevant spectral window 
since it involves the filling-in of both telluric and solar lines 
and is sensitive to cloud properties. The measurement artifacts 
often require the addition of an effective absorber term in the 
fitting, but modeling of them can also be quite complicated 
and may or may not fit the formulation in equation (1). As with 
the DOAS method, the BRD and ISF algorithms also rely on 
empirical, instrument-specific corrections to the radiance data 
in order to reduce retrieval noise and biases. 

[7] Instead of attempting to model all these various factors, 
we propose to replace them with characteristic features 
derived directly from the measured Sun-normalized radi- 
ances. In this algorithm, the PCA technique is applied to 
the radiance data to extract a set of PCs that capture most of 
measurement-to-measurement variation of the radiances 
(in the absence of the signal of interest). For our problem, 
we may use data from a region presumed free of S0 2 (e.g., 
the equatorial Pacific). Then, the derived PCs will capture 
physical and measurement details other than those associated 
with S0 2 absorption. The PCs are ordered so that the first PC 
explains the most of variance, the second PC explains the 
second most of variance, and so on. A set of n v PCs (v,) can 
be used along with the sensitivity of the radiances to the S0 2 
column (S0 2 Jacobians, 8N/d Qso 2 ) to form a forward model: 

«, dN 

N(a>, Qso, ) = Z + Dso 2 ^ — , (2) 

i=i o!2so 2 

where Vis a measured V value spectrum (N(k) = — 1 00 x logio 
(7(2)/7o(2)). For polluted regions with actual S0 2 signals, the 
forward model can be inverted through standard least squares 
fitting to simultaneously retrieve the VCD of S0 2 (D S02 ) and 
the coefficients of the PCs (co). Note that an assumption here is 
that a linear combination of PCs calculated from S02-free re- 
gions can well describe the non-SO? affected radiances in 
SC> 2 -polluted areas. In most cases this assumption should hold 
true given the relatively weak absorption by S0 2 outside of 


polluted regions. The use of S0 2 Jacobians for the entire fitting 
window also removes the step for converting SCD to VCD 
using an AMF. 

2.2. Application to the OMI Instrument 

[s] OMI level 1 B (L 1 B) radiance and irradiance data in the 
spectral window of 3 10.5-340 nm were used in this study, to- 
gether with the VCD of 0 3 (£2 03 ) from the L2 OMT03 product 
[Bhartia and Wellemeyer, 2002], This spectral window 
includes the strong S0 2 absorption band at 310.8 nm 
and minimizes potential interferences due to stray light 
at shorter wavelengths. Our experiments also showed that 
the inclusion of wavelengths > 340 nm had no discernible 
impacts on retrievals. To better account for the orbit-to- 
orbit measurement artifacts, we analyzed data from one 
orbit at a time. Because the 60 cross-track positions 
(rows) of OMI are individual detectors (and essentially 
different instruments), we also treated each row of each 
orbit separately and filtered out pixels with slant column 
0 3 (S 03 ) > 1500 DU (Dobson unit, 1 DU = 2.69x 10 16 
molecules/cm 2 ); large ,S' ()3 can diminish the measurement 
sensitivity to S0 2 . 5o 3 was calculated from 0 03 , the solar 
zenith angle ( 0 O ), and the viewing zenith angle (60, 

S 0 , = Oo 3 ( sec(# 0 ) + sec(0)). (3) 

[9] After data screening, about 900-1 300 pixels of various 
cloud fractions remained in each row for the PCA. We tested 
a few different sets of input spectra for generating the PCs: 

( 1 ) the V value spectra, (2) the V value spectra nonnalized 
against 340 nm, and (3) the V value spectra after a fitted sec- 
ond-order polynomial were subtracted from each spectrum. 
As the retrievals of S0 2 were generally very similar for these 
different PCAs, hereafter we focus on the first method. 

[10] Given the presence of transient S0 2 plumes, one 
challenge is how to differentiate between S02-free and 
SC> 2 -polluted regions. We note that for the vast majority 
of pixels, S0 2 absorption is normally not strong enough to 
cause significant changes in the radiances. It is thus unlikely 
for the PC(s) associated with or affected by SC) 2 absorption 
(vsoc) to be among the first few leading PCs, even if PCA is 
conducted on an entire row without first screening out pol- 
luted scenes. As long as n v is sufficiently small to exclude 
Vso 2 from equation (2), reasonable initial estimates of 
S0 2 (Oso 2 ini) can be obtained. A second step PCA can then 
be applied to pixels with small £2 so 2 ini (in this study the 
threshold was set at ±1.5 standard deviations for each 
orbit/row) to extract a new set of PCs to update equation 

(2) , followed by updated retrievals of S0 2 . This step can be 
repeated. We found that the changes in the retrieved S0 2 gen- 
erally became very small within two iterations. We conducted 
the second step PCA and retrievals for three segments of each 
row: a “tropical” region with S 03 <100 DIJ + min(.S' 03 ), and 
two regions north and south of it. The resulting PCs for each 
segment more closely matched the measurements than the 
PCs acquired using the entire row. The use of these regionally 
derived PCs reduced retrieval biases. 

[ 11 ] Another important consideration is how to determine 
n v , the number of PCs to use in equation (2). Too few PCs 
will lead to large biases in S0 2 while too many may cause 
over fitting. Our test results indicated that in most cases, at 
least 20-30 PCs were necessary, while occasionally in the 
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Figure 1. (a) The first PC extracted from the radiance data 
from row 1 1 of orbit 10,990, which passed over the Pacific on 
08 August 2006. (b) The second and third PCs from the analysis, 
(c) The fourth and fifth PCs from the analysis, (d) The fitting 
residuals for a pixel near Hawaii presumably influenced 
by a volcanic plume. The red and black lines represent the 
fitting residuals with and without the S0 2 absorption term in 
equation (2), respectively. The estimated S0 2 VCD from the 
fitting is 2.21 DU. The green lines in Figures la, lb, and Id 
show the mean N value spectrum of the row, the 0 3 and S0 2 
cross sections both at 243 K, respectively. All units are in 
N values unless otherwise specified. 

presence of relatively strong S0 2 signals, no more than 8 PCs 
could be used. Instead of using a constant n v , we determined 
it for each row by checking the correlation between PCs 
(after the fifth) and the S0 2 Jacobians. For example, if signif- 
icant correlation at the 95% confidence level existed between 
the zth PC and S0 2 Jacobians, only the preceding z'-l PCs 
would be included. We found this to be an effective way to 
prevent the inclusion of Vso 2 and collinearity in equation 
(2). To maintain computational efficiency, an upper limit of 
30 was set for n v . The differences in S0 2 due to the use of a 
greater upper limit (e.g., 50) were found to be marginal, espe- 
cially for polluted areas. 

[12] The VLIDORT radiative transfer code [Spurr, 2008] 
was employed to calculate S0 2 Jacobians. To facilitate the 
comparison between the new algorithm and the operational 
PBL product, we used the same fixed atmospheric profiles 
as in the operational algorithm, and also assumed the same 
surface albedo (0.05), surface pressure (1013.25 hPa), fixed 
solar zenith angle (30°), and viewing zenith angle (0°). For 
S0 2 , a climatological profile over the summertime eastern 


U.S. was used. For O3 and temperature, the OMT03 standard 
midlatitude profiles with Q 03 = 325 DU were used. Details 
can be found in Krotkov et al. [2006]. In the future, we plan 
to expand the look-up table for S0 2 Jacobians to more realis- 
tically account for different measurement conditions. It 
should also be noted that while the PCA was conducted for 
pixels of all-sky conditions, we focus on relatively cloud-free 
scenes in the following sections, given that the calculated 
S0 2 Jacobians are not suitable for cloudy conditions. 


3. Results 

[13] As an example, Figures la-lc show the typical first 
few leading PCs extracted from the N value spectra of an en- 
tire row. The first PC essentially represents the mean spec- 
trum of all the pixels. The second PC closely follows the 
spectral feature of the 0 3 cross section, suggesting that 0 3 
absorption is a dominant contributor to the variance in the 
window. The third PC may be related to the surface contribu- 
tion. It is difficult to assign a well-defined geophysical mean- 
ing to the fourth, the fifth, and the following PCs, but they 
probably reflect the rotational-Raman effect or various mea- 
surement artifacts such as the wavelength shift between radi- 
ance and irradiance spectra. The residuals from two different 
least squares fittings for a pixel near Hawaii are also shown in 
Figure Id. While the same set of 30 PCs were used in both 
fittings, only one (red line) included the S0 2 Jacobians. As 
can be seen from the figure, the inclusion of S0 2 Jacobians 
had little effects at wavelengths > 320 nm, but substantially 
reduced the residuals in the strong S0 2 absorption bands at 
310.8 and 313 nm. The initial estimate of S0 2 in the pixel 
was 2.21 DU, implying the influence of a nearby volcano. 

[14] Figure 2 compares the global monthly mean S0 2 for 
August 2006 from the PCA algorithm and the operational 
OMI L2 PBL S0 2 product. The new algorithm largely re- 
duces the systematic biases in the operational data, removing 



Figure 2. (a) Monthly mean S0 2 for August 2006 retrieved 
using the PCA algorithm. Data were gridded to 0.25° x 0.25°. 
Pixels outside the center 50 rows, or with radiative cloud frac- 
tion >0.3 or slant column 0 3 > 1500 DU were excluded. 
Gray-shaded grid cells have less than five measurements during 
the month, (b) Same as in Figure 2a but for the operational OMI 
L2 PBL S0 2 data. 
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Figure 3. (a) The monthly mean SO 2 for August 2006 over the eastern U.S. retrieved using the PCA algorithm. Solid circles 
mark the locations of some major SO 2 point sources (> 70 kt/yr). (b) Same as in Figure 3a but for the operational OMI L2 
PBL SO 2 product. Smaller stationary SO 2 sources may also be detected by the PCA algorithm, but likely will require data 
averaging over a longer period of time. 


the step changes along 30°N and 30°S (probably related to 
the O3 profile shape change in the 0MT03 algorithm), the 
positive values over the Tibet Plateau and the Rocky 
Mountains, and also the large negative values at higher 
latitudes. Meanwhile, the major known SO 2 source regions 
including eastern China, the eastern U.S., Mexico City, the 
industrial region in South Africa, as well as various degassing 
volcanoes are clearly discernible in the new retrievals. The 
S0 2 plume in the South Pacific (20°S, 170°W) was from the 
submarine eruption of the Home Reef volcano in Tonga that 
started on 07 August 2006. A close-up look at the eastern U.S. 
(Figure 3) further reveals the improvements made in the 
new algorithm. With reduced noise and biases, the large point 
sources in the region, such as the power plants in the Ohio 
River valley, Atlanta, and mid-Atlantic coast can be more 
clearly distinguished. More regional examples are provided 
in the supporting infonnation. In some cases, the PCA algo- 
rithm may potentially be employed to monitor SO 2 pollution 
at higher temporal resolutions, as shown in the daily and 
weekly SO 2 maps also available in the supporting information. 

[15] The mean and standard deviation of the PCA SO2 and 
the operational OMI PBL SO 2 were calculated for the equato- 
rial Pacific (10° S-10°N, 120°W-150°W) to compare the noise 
levels of the two retrievals (Table 1). For this presumably SO 2 - 
free region, the standard deviation of PCA-retrieved SO 2 is 
~0.5 DU, half that of the operational OMI product (-1.0 DU). 
The day-to-day variation of the mean PCA-retrieved SO 2 over 
the region (between —0.03 and 0.02 DU) is also smaller than 
that of the operational product (between —0.14 and 0.09 DU). 


The improvements in the PCA retrievals are likely due to 
the use of more wavelengths and better characterization of 
orbit-to-orbit measurement artifacts (e.g., due to small changes 
in uncorrected detector dark currents). 

4. Discussion and Future Work 

[16] In summary, we have developed a new S0 2 retrieval 
algorithm based on principal component analysis of satel- 
lite-measured radiance data. Preliminary application of the 
new algorithm to OMI suggests that it can greatly reduce sys- 
tematic biases in the current operational OMI PBL SO 2 data, 
and it suppresses the retrieval noise by a factor of 2. Our ap- 
proach takes advantage of the fact that usually only a small 
portion of each satellite orbit has discernible SO 2 absorption 
signals, and data from the rest of the orbit can be used to char- 
acterize and extract other physical and measurement details. 
While also relying on the least squares fitting of the measured 
radiances, our method differs from the DOAS approach in 
that its forward model contains basis functions mostly de- 
rived from the data, instead of various precalculated refer- 
ence spectra. This decreases the uncertainties associated 
with modeling and instrumental errors and speeds up the cal- 
culation. With much less computation required, the new PCA 
algorithm is much faster than the full spectral fit and requires 
only about 4-5 min to process an entire OMI orbit using a 
single state-of-the-art CPU. 

[17] Another advantage of our PCA-based algorithm is that 
it largely eliminates the need to develop specific, empirical 


Table 1. The Statistics of the PCA-Retrieved and the OMI Operational PBL S0 2 Over the Equatorial Pacific (10°S-10°N, 120°W-150°W) 
in August 2006 a 


Date b 

Number of Pixels 

PCA S0 2 Mean (DU) 

PCA S0 2 SD (DU) 

Operational Mean (DU) 

Operational SD (DU) 

08/01 

10034 

-0.002 

0.511 

0.059 

0.949 

08/06 

8823 

0.006 

0.512 

0.010 

0.994 

08/11 

7056 

0.017 

0.501 

-0.033 

1.015 

08/16 

7007 

0.030 

0.507 

-0.053 

0.945 

08/21 

8299 

-0.009 

0.486 

-0.043 

0.937 

08/26 

9538 

-0.017 

0.498 

0.000 

0.952 

08/31 

9838 

-0.012 

0.504 

0.060 

0.946 

Range c 


-0.020 to 0.030 

0.484 to 0.561 

-0.140 to 0.094 

0.929 to 1.064 


a Data outside the center 50 rows, or with radiative cloud fraction > 0.3 or slant column 0 3 > 1500 DU excluded. 
“Dates are formatted as month/day. 

“Minimal and maximal values for the entire month. 
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corrections to the radiance data for each instrument. Rather, 
measurement artifacts are accounted for by the PCs directly 
extracted from the radiance data. This reduces the potential 
artifacts/biases introduced by instrument-specific data cor- 
rection schemes. The algorithm can be easily adapted to other 
satellite sensors, and this feature makes it particularly useful 
for building long-term, consistent S0 2 data records. In fact, 
we have tested the algorithm on the Ozone Mapping and 
Profiler Suite (OMPS) nadir mapping instrument fly- 
ing on the Suomi National Polar-orbiting Partnership 
satellite. Using the algorithm with minimal changes (the only 
major change being the use of instrument-specific slit func- 
tions for SO 2 Jacobians), we achieved very consistent, high 
quality S0 2 retrievals from both OMI and OMPS. 

[is] Next, we plan to expand the calculations of S0 2 
Jacobians to account for different viewing geometries, sur- 
face albedo, and 0 3 and S0 2 profiles. This is expected to 
further reduce retrieval noise and biases especially for 
oceanic regions. We will also more thoroughly evaluate 
the data quality, including an analysis of error propagation 
to estimate retrieval errors due to measurement noise. For 
data validation, the PCA retrievals will also be compared 
to existing airborne S0 2 measurements over the U.S. and 
China, as well as other data sources. Finally, we will inves- 
tigate the possibility of applying the algorithm to other 
trace gas species. Some trace gases (e.g., FICHO) have 
fairly inhomogeneous spatial distributions similar to S0 2 
and could be suitable for the approach. 

[ 19 ] Acknowledgments. We acknowledge the NASA Earth Science 
Division for funding of OMI SO 2 product development and analysis. The 
Dutch-Finnish-built OMI instrument is part of the NASA EOS Aura satellite 
payload. The OMI instrument is managed by KNMI and the Netherlands 
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